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Abstract . 

The  estimates  of  percentage  points  of  a  distribution,  obtained  from 
one  large  sample  of  Monte  Carlo  values,  are  compared  with  those  obtained 
by  dividing  the  large  sample  into  several  subsamples  and  taking  the 
average.  The  single-sample  method  appears  to  be  preferable. 
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ON  THE  ACCURACY  OF  SIMULATED  PERCENTAGE  POINTS 


By 

J.M.  Juritz,  J.W.F.  Juritz  and  M.A.  Stephens 
University  of  Cape  Town  and  Simon  Fraser  University 

1.  Introduction. 

In  recent  years  Monte  Carlo  simulation  methods  have  been  used  more 
and  more  frequently  to  produce  tables  of  percentage  points  of  probability 
distributions  when  mathematical  difficulties  prevent  an  analytic  solution 
to  the  problem.  Schafer  (1974)  has  outlined  possible  sources  of  error 
and  presented  a  method  for  calculating  a  confidence  interval  for  a  per¬ 
centile.  This  is  based  on  several  Monte  Carlo  runs.  This  note  is  a  revision  | 

of  a  previous  manuscript  (Juritz,  Juritz,  and  Stephens,  1978).  In  it  we 
present  an  alternative  method  based  on  the  order  statistics  in  one  run, 
which  has  certain  advantages  which  we  discuss  below.  We  illustrate  the 
two  methods  with  simulations  for  percentage  points  of  the  normal  distribution. 

2.  Confidence  Intervals  Using  Order  Statistics. 

Let  f(x)  be  the  probability  density  function  of  a  continuous  random 
variable  X  and  let  denote  the  pt^1  percentile  of  X.  Suppose  a  random 

sample  of  size  n  is  drawn  from  f(x)  and  let  be  the  i**1  order 

A.U 

statistic,  l.e.  the  1  smallest  observation.  David  (1970)  shows  that 
for  any  continuous  random  variable  the  random  Interval  (X^,X^)  with 
r  <  s  covers  with  probability  ir(r,s,n,p)  given  by 


(2.1) 


ir(r,s,n,p) 

Y  (j)pid-p)n“i . 

i-r 


1 


This  probability  does  not  depend  on  f(x),  and  so  is  very  suitable 
for  use  in  simulation  studies,  where  f(x)  is  always  unknown.  In 


this  application,  n  is  the  number  of  repetitions,  p  *  Prob(X  £  £p) 
and  r  and  s  must  be  chosen  so  that  the  interval  (X(r)»  X(s)^  ^as 

the  desired  confidence  level  1-a;  i.e.  we  choose  r  and  s  so  that 
7r(r,s,n,p)  *  1-a. 

Since  in  a  simulation  n  is  very  large  indeed  the  value  of  Tr(r,s,n,p) 
can  be  evaluated  using  the  normal  approximation  to  the  binomial  distribution, 
and  r  and  s  can  then  be  found.  Let  z^_a/2  be  ** 100 (1-a/ 2) ^  per¬ 

centile  of  the  standard  normal  distribution.  Then 

1/2 

r  -  -Zi_a/2(np(l-p) )  +  np  +  1/2  and 

(2-2) 

1/2 

s  *  Si_a/2(np(l-p)>  +  np  +  1/2  . 

The  expressions  for  both  r  and  s  end  in  (+  1/2)  because  the  sum  in  (2.1) 
runs  from  r  to  s-1,  not  to  s.  Let  k  *  [np]+l,  where  [np]  denotes  the 
largest  integer  less  than  or  equal  to  np,  and  define 


(2.3) 


k(k) 


is  the  usual  estimate  of  £p  and  the  interval  L  =  (X^,X^j) 
gives  the  confidence  interval  for  £  ,  at  confidence  level  100(l-a)%,  since 
it  covers  £p  with  probability  1-a  (approximately,  but  to  high  accuracy). 

We  shall  refer  to  the  above  as  the  single-sample  method.  Schafer  (1974) 
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proposes  an  alternative  method  based  on  the  fact  that  is  asymptotically 

normally  distributed  with  mean  and  variance 


a2  -  (EfeBl)(f(E  )) 
n  p 


-2 


Instead  of  a  single  run  with  n  repetitions*  c  runs  each  with  m  repeti¬ 


tions  (n  »  cm)  are  made.  For  each  run  the  sample  percentile  X 


([mp]+l) 


estimates  £  .  Denoting  these  estimates  by  X  ,,X  X  ,  the  per- 

p  P1  pc 


cent lie  is  estimated  by  the  average 


(2.4) 


x-M  xf 

P  c  pi 


and  the  sample  variance 


(2.5) 


s2  ■  ii  <w2/<c-» 


estimates  0  .  Then 


V'W!  “c'1/2-  V*i-«/2  sc'1/2) 


is  a  confidence  interval  for  5^  with  confidence  probability  l-a> 
where  ti_a/2  is  the  uPPcr  100(l-a/2)th  percentile  of  Student's  t 
with  c-1  degrees  of  freedom.  We  shall  call  this  the  multi-sample 
method.  The  number  of  runs  must  be  chosen  to  give  a  good  estimate 
of  the  standard  deviation. 
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.  Comparison  of  the  Two  Methods. 

3.1.  Value  of  the  estimated  percentage  point. 

Since  in  practice  everyone  who  does  a  Monte  Carlo  study  for  a  table 

A 

of  percentage  points  gives  a  single  value  X  which  is  the  "best1'  estimate 

P 

of  the  question  arises  as  to  how  this  should  be  done.  Is  it  best  to 

make  c  runs  and  then  take  the  mean  or  make  one  large  run,  then  take 

Xp  as  the  point  estimate?  In  both  cases  the  point  estimates  are  biased 
in  finite  samples.  David  (1970,  page  65)  gives  expansions  for  the  expected 
value  and  variance  of  the  k^  order  statistic  in  a  sample  of  size  n  in 
terms  of  the  inverse  cumulative  distribution  function,  Q(x),  of  the  parent 
population  and  its  derivatives. 

Let  p^  *  k/ (n+1) ,  *  1-p^*  and  ^et  be  the  it^1  derivative 

of  Q(  • )  evaluated  at  x.  Then 

(3.1)  E«(t))  -  ,(pk)  ♦  ,<2>(pk)  *  . 


For  with  k  *  [np]+l,  to  be  an  unbiased  estimator  of  we 

need  E(X^))  "  ^p  *  Q(p)*  For  certain  values  of  n  and  p,  Q(p^) 
will  equal  Q(p),  and  the  bias  in  X^  *s  8*ven  by  the  second  and 
subsequent  terms  in  (3.1).  This  would  be  so,  for  example  if  n  *  99, 
p  -  .8;  then  k  *  [np]+l  -  80  and  the  first  term  of  (3.1)  is 


Q(80/100)  ■  Q(.8)  ■  5  j  as  required.  However,  in  general  this  will 

not  be  so,  and  to  Investigate  the  bias  we  expand  Q(p^)  about  Q(p) 
(2)  (2) 

and  Q  (p^)  about  Q  (p),  using  Taylor’s  series.  Then  (3.1) 


becomes 


+2*(n+2T  +  (Pk“P)Q^3^(p)  •••  +  •••  • 

Since  p^-p  =  ^+|~^  -  p  *  where  0  £  e  £  1,  we  have,  to  terms 

of  order  1/n, 

(3.2)  E«(l)>  -  Q(p)  +  ^  Q(1)(P)  +  Q(2)w  • 

When  several  samples  are  used,  the  estimate  of  £  is  X,0v,  the  mean 

p 

of  the  c  values  of  X^,  where  %  is  fmp]+l.  Then 

(2) 

E«W))  -  E<X(Jt))  -  Q(p)  *  QU)(p)  +  ■>(12-P>;2)  (P> 

to  order  1/m,  where  again  0  <  £  <  1. 

The  bias  in  using  the  single-sample  estimate  is  therefore  reduced 
by  a  factor  of  approximately  c. 


3.2.  Variance  of  estimated  percentage  points,  and  length  of  confidence 
intervals. 

Also  from  David  (1970,  page  65)  we  have,  for  the  single  sample  of 
size  n: 

Var(X(k))  -  {Q(1)(pk))2  +  terms  in  l/(n+2)2  . 


Thus  to  order  1/n, 


Var(X^)  *  ^n+2*^  (Q^(p))  *  an^  the  variance  of  X^  is 

V"°W  ■  f(S«  fQ(1)(p»2 


to  order  1/m,  For  large  samples  the  ratio  of  these  variances  is 

The  expected  length  of  the  confidence  interval  for  £  ,  given  by  X 

-1/2 

will  be  approximately,  to  order  n  , 


1. 

P 


(3.3) 


Si 


=  2z 


l-a/2 


f£il^,1/2  Q«>(p) 


the  expected  length  of  L  m  (X^,X^),  using  (3.1)  with  (2,1)  and 
(2.2)  will  be  approximately 


2z 


l-a/2 


( 


£i^El)l/2Q(1)(p) 


Thus  the  lengths  of  the  confidence  intervals  given  by  the  two  methods  can 

-1/2 

be  expected  to  be  roughly  the  same  size,  diminishing  as  n  .  There 

is  of  course  a  differnce  in  the  nature  of  the  confidence  intervals:  the 

interval  obtained  by  the  multisample  method  will  have  endpoints  equally 

spaced  about  the  estimate  X  ,  while  L  is  designed  so  that  the  end- 

P 

points  have  estimated  significance  levels  equally  spaced  about  p.  For 
many  uses  of  confidence  intervals  the  second  property  may  well  be  the  more 
desirable. 
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3,3,  Monte  Carlo  Results. 


Some  Monte  Carlo  studies  have  been  made  which  illustrate  the  above 

points.  Samples  of  size  n  were  taken  from  a  standard  normal  distribution 

and  each  sample  was  used  to  estimate  the  95%  and  97.5%  percentage  point; 

the  sample  was  then  randomly  divided  into  c  =  10  groups,  each  with 

m  *  n/c  observations,  and  the  estimates  found  by  as  described  above. 

Results  are  given  in  Table  1.  Since  the  same  set  of  n  values  was  used 

for  both  methods,  the  resulting  estimates  are  correlated;  nevertheless 

the  larger  bias  in  the  multi-sample  estimate  can  be  seen  for  small  n. 

When  n  increases,  each  subsample  becomes  sufficiently  large  that  the 

individual  estimates  will  become  quite  accurate  and  X  loses  its  bias. 

P 

The  approximate  expected  length  of  the  confidence  interval,  given  by 
(3.3)  is  given  in  the  Table.  The  approximation  gives  good  results  for 
p  =  0.95,  but  the  estimate  appears  to  be  too  large  for  p  =  0.975.  Never¬ 
theless,  it  can  be  seen  that  there  is  little  to  choose  between  the  length 
of  the  intervals  given  by  the  two  methods.  Table  2  shows  the  actual 

A 

probability  levels  corresponding  to  the  estimates  given  by  both 

methods:  again,  the  smaller  bias  for  the  single  sample  method  is  shown, 
for  relatively  small  n.  It  is  this  smaller  bias  which  would  appear  to 
give  an  advantage  to  the  single  sample  method,  since  point  estimates  and 
not  confidence  intervals  are  usually  what  are  required  for  Monte  Carlo 
percentage  points. 

This  research  was  supported  in  part  by  the  National  Research  Council 
of  Canada  (now  the  National  Science  and  Engineering  Research  Council),  and 
by  the  U.S.  Office  of  Naval  Research,  Contract  N00014-76-C-0475,  and  the 
jthors  thank  these  agencies  for  their  support. 
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TABLE  1.  ESTIMATES  OP  PERCENTILES  OF  A  STANDARD  NORMAL  DISTRIBUTION. 
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.031 


TABLE  2. 


Actual  probability  levels  of  the  estimated  percentile 

percentile  £  of  the  N(0,1)  distribution. 

P 


or  X 

P 


for  a 


Method 

n  — - - 


500 

Multi-sample  Single- sample 

p  *  0. 975 

.9769  .9756 

1000 

.9765 

.9753 

2000 

.9757 

.9786 

5000 

.9774 

.9771 

10000 

.9740 

.9740 

500 

p  »  0.95 

.9578 

.9499 

1000 

.9540 

.9520 

2000 

.9543 

.9532 

5000 

.9526 

.9525 

10000 

.9503 

.9510 
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